Introduction

After a statement describing the overall goals and personnel of this
research project, this final report contains a list of technical reports
on research supported by this project and a description of research
accomplishments as given in the abstracts or introductions of the
technical reports that have been issued.


Goals 


This research has developed a general approach to statistical data
analysis (in particular to non-parametric statistical data modeling and to
robust analysis and modeling of statistical data, including the one-sample,
two-sample, bivariate-sample and multivariate-sample cases).

The new results being obtained seem to be attracting wide interest:
(1) Professor Parzen's paper "Nonparametric Statistical Data Modeling"
is a major invited address at the August 1978 Annual Meeting of the
American Statistical Association and will be published with discussion in
the December 1978 issue of the Journal of the American Statistical
Association; (2) Professor Parzen's paper "A Density-Quantile Function
Perspective on Robust Estimation" was given at the April 1978 ARO
Symposium on Robust Estimation and will be published in its proceedings.

NONPARAMETRIC STATISTICAL DATA SCIENCE:

A UNIFIED APPROACH BASED ON DENSITY ESTIMATION
AND TESTING FOR "WHITE NOISE"

by 

Emanuel  Parzen 

The aim of this paper is to introduce a single canonical problem to
which one can transform many basic statistical inference and statistical
data analysis problems. This canonical problem is most simply described
as the problem of testing for white noise via density estimation or
smoothing. We first state some of the inference problems which we seek
to unify.

One-sample (univariate) inference problems. Let $X_1, \ldots, X_n$
be i.i.d. (independent identically distributed) random variables with common
a.c. (absolutely continuous) d.f. (distribution function) $F(x)$ and
probability density function $f(x)$. One seeks to efficiently:

(i) estimate $f(x)$ non-parametrically (without making any
prior assumption about its functional form),

(ii) test, for a specified probability density $f_0(x)$, whether there
exist constants $\mu$ and $\sigma$ such that

$$f(x) = \frac{1}{\sigma} f_0\!\left(\frac{x - \mu}{\sigma}\right),$$

(iii) estimate the parameters $\mu$ and $\sigma$ (called location and
scale parameters).
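
Problem (i) can be illustrated with a kernel density estimate. The sketch below is only an illustration under stated assumptions: the Gaussian kernel, the normal-reference bandwidth rule, and the function names are choices made here and are not taken from the report.

```python
import math
import random
import statistics


def gaussian_kernel(t):
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def kernel_density(sample, x, bandwidth):
    """Kernel estimate of f(x): average of K((x - X_i) / h) / h over the sample."""
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * s * n^(-1/5) (an assumed default)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


# Simulated i.i.d. data; no functional form is assumed by the estimator itself.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
h = rule_of_thumb_bandwidth(sample)
estimate_at_zero = kernel_density(sample, 0.0, h)
print(f"h = {h:.3f}, estimated f(0) = {estimate_at_zero:.3f}")
```

The estimate is a proper density for any bandwidth h > 0, because each kernel term integrates to one; the bandwidth only controls how smooth the estimate is.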


Two-sample (univariate) inference problems. Let $X_1, \ldots, X_m$
be i.i.d. with common a.c. d.f. $F(x)$ and let $Y_1, \ldots, Y_n$ be i.i.d.
with common a.c. d.f. $G(x)$. One seeks to efficiently:

(i) test whether there exist constants $\mu$ and $\sigma$ such that

$$G(x) = F\!\left(\frac{x - \mu}{\sigma}\right),$$

(ii) estimate $\mu$ and $\sigma$.
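
If G(x) = F((x - mu)/sigma) with sigma > 0, then Y has the same distribution as mu + sigma X, so the quantile functions satisfy Q_Y(u) = mu + sigma Q_X(u). The sketch below uses this relation to estimate mu and sigma by a least-squares line through matched sample quantiles. It is one simple illustration and not the procedure developed in the report; the grid of u values and the function names are assumptions.

```python
import math
import random


def sample_quantile(sorted_values, u):
    """Sample quantile: the j-th order statistic for (j-1)/n < u <= j/n."""
    n = len(sorted_values)
    j = min(n, max(1, math.ceil(u * n)))
    return sorted_values[j - 1]


def fit_location_scale(x, y, grid_size=99):
    """Least-squares fit of Q_Y(u) = mu + sigma * Q_X(u) on a grid of u values."""
    xs, ys = sorted(x), sorted(y)
    us = [k / (grid_size + 1) for k in range(1, grid_size + 1)]
    qx = [sample_quantile(xs, u) for u in us]
    qy = [sample_quantile(ys, u) for u in us]
    mean_qx = sum(qx) / len(qx)
    mean_qy = sum(qy) / len(qy)
    sxx = sum((a - mean_qx) ** 2 for a in qx)
    sxy = sum((a - mean_qx) * (b - mean_qy) for a, b in zip(qx, qy))
    sigma = sxy / sxx
    mu = mean_qy - sigma * mean_qx
    return mu, sigma


# Simulated samples: Y is distributed as 3 + 2 X.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [3.0 + 2.0 * random.gauss(0.0, 1.0) for _ in range(2000)]
mu_hat, sigma_hat = fit_location_scale(x, y)
print(f"mu ~ {mu_hat:.2f}, sigma ~ {sigma_hat:.2f}")
```

A plot of the matched quantiles that departs clearly from a straight line would suggest that no such mu and sigma exist, which is the question posed in (i).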

One-sample multivariate inference problems. Let
$\mathbf{X} = (X_1, \ldots, X_d)$
be a random vector with absolutely continuous multivariate distribution
function $F(x_1, \ldots, x_d)$ and density $f(x_1, \ldots, x_d)$; let
$\mathbf{X}_1, \ldots, \mathbf{X}_n$ be a random sample. One seeks to efficiently:

(i) test whether the components $X_1, \ldots, X_d$ are independent
random variables,

(ii) estimate the multivariate density $f$,

(iii) estimate the regression function

$$\mu(x_1, \ldots, x_{d-1}) = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}].$$

In addition, there are multi-sample univariate inference problems
and multi-sample multivariate inference problems concerned with the
equality of many distributions; however, they are not discussed in this paper.

NEW NONPARAMETRIC APPROACH TO THE
TWO-SAMPLE PROBLEM

by 

Jean-Pierre Carmichael
and 

Emanuel  Parzen 


ABSTRACT 

Given two random samples $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_n)$,
we want to test the hypothesis that $F_X(\cdot) = F_Y(\cdot)$. There are different
possible alternatives. Here we are mostly concerned about change of
location:

$$F_Y(x) = F_X(x - \mu).$$

In Chapter 1, we review the classical parametric and non-parametric
procedures that are currently used. In Chapter 2, we introduce
some new test statistics obtained from Parzen's new formulation of the
problem (1977). In Chapter 3, we present the results of simulations
comparing these different procedures on a wide range of underlying
distributions. In Chapter 4, we document the use of a computer package
developed here, including some new graphical displays.
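
For the location-shift alternative F_Y(x) = F_X(x - mu), one classical nonparametric estimate of mu is the Hodges-Lehmann estimator, the median of all pairwise differences Y_j - X_i. The abstract does not say which classical procedures Chapter 1 reviews, so the sketch below is only an example of the kind of procedure meant, with hypothetical function names and simulated data.

```python
import random
import statistics


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences Y_j - X_i, an estimate of the shift mu."""
    return statistics.median(yj - xi for xi in x for yj in y)


# Simulated samples of sizes m = 60 and n = 80 with true shift 1.5.
random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(60)]
y = [random.gauss(1.5, 1.0) for _ in range(80)]
shift_hat = hodges_lehmann_shift(x, y)
print(f"estimated shift = {shift_hat:.2f}")
```

The estimator uses only the ordering of the pairwise differences, so a few extreme observations in either sample have little effect on it.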

NONPARAMETRIC STATISTICAL DATA MODELING

by

Emanuel Parzen


Introduction 


It is the aim of this paper to introduce new types of keys for
exploratory data analysis (of continuous data) based on estimating the
quantile function and density quantile function. It appears that this
approach leads to an exploratory data analysis which has a firm
probability base. Consequently the distinction between exploratory and
confirmatory data analysis can be regarded as a distinction between
confirmatory non-parametric statistical data analysis or modeling, and
confirmatory parametric statistical data analysis.

The basic proposition of this paper is that exploratory data analysis
and conventional parametric statistical inference both have as their aim
the estimation of the quantile function $Q(u)$, $0 \le u \le 1$, of a random
variable $X$ of which the data $X_1, \ldots, X_n$ are independent (or dependent)
observations. To estimate $Q$, one assumes a representation for it of
the form

$$Q(u) = \mu + \sigma Q_0(u),$$

which is equivalent to the classic location and scale parameter model for
the probability density function:
$f(x) = \frac{1}{\sigma} f_0\!\left(\frac{x - \mu}{\sigma}\right)$. We call this
representation hypothesis $H_0$. One can distinguish four stages of this
model.


I. Parametric model: one assumes $Q_0$ known. Then one's
aim is to estimate $\mu$ and $\sigma$. One uses either maximum likelihood
estimation or optimal linear combinations of order statistics.

II. Goodness of fit: one tests $H_0$ for various specifications of
$Q_0$ (corresponding to the familiar probability laws, such as normal,
exponential, logistic, Weibull, Pareto, Cauchy, and so on).

III. Robust parametric model: $Q_0$ is specified by specifications
which permit small deviations from an ideal model, such as "$Q_0$
symmetric and possibly long tailed" or "$Q_0$ normal except for
contamination by outliers."

IV. Non-parametric model: estimate $Q_0$, either by estimating
the density quantile function $fQ(u) = f(Q(u))$, or through suitable plots
of the sample quantile functions of transformations of the data.
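
The location-scale representation of the quantile function and the density quantile function fQ(u) = f(Q(u)) can be checked numerically. The sketch below takes the standard normal as the known standard law (an assumption made only for illustration) and compares Q(u) = mu + sigma Q_0(u) and fQ(u) = f_0(Q_0(u)) / sigma against a normal distribution with the same mu and sigma. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # plays the role of the known standard law (Q_0, f_0)


def quantile(u, mu, sigma):
    """Q(u) = mu + sigma * Q_0(u) for 0 < u < 1."""
    return mu + sigma * STANDARD.inv_cdf(u)


def density_quantile(u, mu, sigma):
    """fQ(u) = f(Q(u)) = f_0(Q_0(u)) / sigma under the location-scale model."""
    return STANDARD.pdf(STANDARD.inv_cdf(u)) / sigma


mu, sigma = 10.0, 2.0
model = NormalDist(mu, sigma)  # the same law, parameterized directly
for u in (0.1, 0.5, 0.9):
    print(u, quantile(u, mu, sigma), density_quantile(u, mu, sigma))
```

The location parameter drops out of fQ(u) altogether, and the scale parameter enters only as the factor 1/sigma.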

The main aim of this paper is to introduce a "density estimation"
approach to Goodness of Fit tests which also yields estimators of $Q$.
To a specified hypothesis $H_0$: $Q(u) = \mu + \sigma Q_0(u)$, one can define a
density $d(u)$, $0 \le u \le 1$, such that $H_0$ is equivalent to $d(u) \equiv 1$.
Estimation of $d(u)$ provides a test of $H_0$ and also an estimator of the
true $fQ$ function when $H_0$ is rejected. Many density estimation
methods are available; we believe the "autoregressive" method works
best for small samples, and we describe it in detail.
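
This excerpt does not give the paper's definition of d(u), so the sketch below uses a stand-in that has the same flavor: when mu, sigma and the standard d.f. F_0 are specified, the transformed values U_i = F_0((X_i - mu)/sigma) are uniform on [0, 1] exactly when the hypothesis holds, so their density is identically 1. A histogram is used here in place of the autoregressive estimator, and F_0 is taken to be the standard normal; both are assumptions of this illustration.

```python
import random
from statistics import NormalDist


def transformed_density(sample, mu, sigma, bins=10):
    """Histogram density of U_i = F_0((X_i - mu) / sigma) on [0, 1].

    F_0 is taken to be the standard normal d.f. (an assumption of this sketch).
    """
    standard = NormalDist()
    counts = [0] * bins
    for x in sample:
        u = standard.cdf((x - mu) / sigma)
        counts[min(bins - 1, int(u * bins))] += 1
    n = len(sample)
    return [c * bins / n for c in counts]


def flatness_statistic(density, n):
    """Pearson chi-square distance of the histogram from d(u) = 1."""
    bins = len(density)
    return (n / bins) * sum((d - 1.0) ** 2 for d in density)


random.seed(3)
n = 1000
normal_data = [random.gauss(5.0, 2.0) for _ in range(n)]
# Same mean and standard deviation as above, but a skewed (exponential) shape.
skewed_data = [5.0 + 2.0 * (random.expovariate(1.0) - 1.0) for _ in range(n)]
stat_normal = flatness_statistic(transformed_density(normal_data, 5.0, 2.0), n)
stat_skewed = flatness_statistic(transformed_density(skewed_data, 5.0, 2.0), n)
print(f"normal data: {stat_normal:.1f}, skewed data: {stat_skewed:.1f}")
```

A histogram that stays close to 1 is consistent with the hypothesized law, while systematic departures show where the data disagree with it.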

A DENSITY-QUANTILE FUNCTION PERSPECTIVE
ON ROBUST ESTIMATION

by 

Emanuel  Parzen 


A perspective on robust estimation is discussed, with three broad
sets of conclusions. Point I: The means must be justified by the ends.
Point II: Graphical goodness-of-fit procedures should be applied to data
to check if they are adequately fitted by the qualitatively defined models
which are implicitly assumed by robust estimation procedures. Point III:
There is a danger that researchers may regard robust regression
procedures as a routine solution to the problem of modeling relations between
variables without first studying the distribution of each variable.

New tools introduced include: Student's window; Quantile-Box
Plots; density-quantile estimation approach to goodness-of-fit tests;
and a definition of statistics as "arithmetic done by the method of
Lebesgue integration."



TECHNIQUES OF QUANTILE REGRESSION

by

Jean-Pierre Carmichael


Introduction

Given observations $\{(X_i, Y_i),\ i = 1, \ldots, n\}$ on random variables
$(X, Y)$ with joint distribution $F_{X,Y}(x, y)$, we want to estimate the
regression function of $Y$ on $X$, $E[Y \mid X = x]$, nonparametrically.

In order to find a natural estimator (simple computationally and
intuitively appealing), Parzen (1977) developed the following theoretical
approach.

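1. Theoretical Approach:

Let $U_1 = F_X(X)$ and $U_2 = F_Y(Y)$; then the joint distribution
of $U_1$ and $U_2$ is

$$D_{U_1,U_2}(u_1, u_2) = F_{X,Y}\big(Q_X(u_1), Q_Y(u_2)\big)$$

and their joint density is

$$d_{U_1,U_2}(u_1, u_2) = \frac{f_{X,Y}\big(Q_X(u_1), Q_Y(u_2)\big)}{f_X\big(Q_X(u_1)\big)\, f_Y\big(Q_Y(u_2)\big)},$$

where

$F_Z(\cdot)$ is the distribution function of $Z$,
$f_Z(\cdot)$ is its density function,
$Q_Z(\cdot)$ is its quantile function.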

Let $r(x)$ be the regression function of $Y$ on $X = x$:

$$r(x) = E[Y \mid X = x] = \frac{\int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\, dy}{f_X(x)}.$$


We now define the regression-quantile function $rQ(\cdot)$ by

$$rQ(u) = r\big(Q_X(u)\big) = E[Y \mid X = Q_X(u)].$$


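How do we compute $rQ(\cdot)$?

By definition,

$$rQ(u) = \frac{\int_{-\infty}^{\infty} y\, f_{X,Y}\big(Q_X(u), y\big)\, dy}{f_X\big(Q_X(u)\big)}.$$

Let $y = Q_Y(u_2)$, so that $dy = du_2 / f_Y\big(Q_Y(u_2)\big)$; then

$$rQ(u) = \int_0^1 Q_Y(u_2)\, d_{U_1,U_2}(u, u_2)\, du_2.$$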


If we introduce a Dirac delta function, we can express $rQ(\cdot)$ as a
double integral

$$rQ(u) = \int_0^1 \int_0^1 Q_Y(u_2)\, \delta(u_1 - u)\, dD_{U_1,U_2}(u_1, u_2).$$


We estimate $rQ(\cdot)$ by

$$\widehat{rQ}(u) = \int_0^1 \int_0^1 Q_Y(u_2)\, \frac{1}{h(n)}\, K\!\left(\frac{u_1 - u}{h(n)}\right) d\tilde{D}_{U_1,U_2}(u_1, u_2).$$
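
The kernel estimator of rQ(u) can be sketched in a few lines. The version below is an illustration under assumptions, not the report's implementation: the empirical joint distribution is taken to put mass 1/n on each observed pair, the value of Q_Y at the i-th pair is taken to be Y_i, the first coordinate is U1_i = rank(X_i)/(n + 1), and K is the Epanechnikov kernel. The function names are hypothetical.

```python
def epanechnikov(t):
    """Kernel K(t) = 0.75 * (1 - t^2) on [-1, 1]; it integrates to 1."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def regression_quantile(x, y, u, h):
    """Kernel estimate of rQ(u) = E[Y | X = Q_X(u)].

    With mass 1/n on each pair and Q_Y evaluated as Y_i at the i-th pair,
    the double integral reduces to (1/n) * sum_i Y_i * K((U1_i - u) / h) / h.
    U1_i = rank(X_i) / (n + 1) is an assumed convention.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u1 = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u1[i] = rank / (n + 1)
    return sum(y[i] * epanechnikov((u1[i] - u) / h) for i in range(n)) / (n * h)


# Noiseless line Y = 2X on an even grid, so r(x) = 2x and rQ(0.5) is close to 1.
n = 200
x = [i / n for i in range(1, n + 1)]
y = [2.0 * xi for xi in x]
estimate = regression_quantile(x, y, u=0.5, h=0.1)
print(f"estimated rQ(0.5) = {estimate:.3f}")
```

Because the kernel weights are not renormalized, this form of the estimator is biased toward zero when u is within h of 0 or 1, where part of the kernel mass falls outside the unit interval.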


